
    An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture

    We prove an almost constant lower bound on the isoperimetric coefficient in the KLS conjecture. The lower bound has dimension dependency $d^{-o_d(1)}$. When the dimension is large enough, our lower bound is tighter than the previous best bound, which has dimension dependency $d^{-1/4}$. Improving the current best lower bound on the isoperimetric coefficient in the KLS conjecture has many implications, including improvements of the current best bounds in Bourgain's slicing conjecture and in the thin-shell conjecture, better concentration inequalities for Lipschitz functions of log-concave measures, and better mixing time bounds for MCMC sampling algorithms on log-concave measures.
    Comment: 25 pages, 1 figure, accepted in GAFA journal.
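
    For readers new to the conjecture, the quantity being bounded is the isoperimetric (Cheeger) coefficient of a probability measure $\mu$ on $\mathbb{R}^d$; the standard definition, stated here for context, is

        % Isoperimetric (Cheeger) coefficient of \mu; the KLS conjecture asserts
        % that \psi_\mu is bounded below by a universal constant (independent of
        % the dimension d) for every isotropic log-concave measure \mu.
        \psi_\mu = \inf_{S \subseteq \mathbb{R}^d} \frac{\mu^+(\partial S)}{\min\{\mu(S),\, \mu(S^c)\}},
        \qquad
        \mu^+(\partial S) = \liminf_{\varepsilon \to 0^+} \frac{\mu(S_\varepsilon) - \mu(S)}{\varepsilon},

    where $S_\varepsilon$ denotes the $\varepsilon$-neighborhood of $S$. A lower bound on $\psi_\mu$ of order $d^{-o_d(1)}$ is therefore "almost constant" in the dimension.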

    When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?

    We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm, and satisfies isoperimetry. We bound the gradient complexity to reach $\epsilon$ error in total variation distance from a warm start by $\tilde{O}(d^{1/4}\,\mathrm{polylog}(1/\epsilon))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass the previous analysis of the Metropolis-adjusted Langevin algorithm (MALA), which has $\tilde{O}(d^{1/2}\,\mathrm{polylog}(1/\epsilon))$ dimension dependency (Wu et al., 2022), we reveal a key feature in our proof: the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Moreover, to deal with another bottleneck in the literature, the control of the overlap between HMC proposal distributions, we provide a new approach to upper bound the Kullback-Leibler divergence between push-forwards of the Gaussian distribution through HMC dynamics initialized at two different points. Notably, our analysis does not require log-concavity or independence of the marginals, and relies only on an isoperimetric inequality. To illustrate the applicability of our result, several examples of natural functions that fall into our framework are discussed.
    Comment: 42 pages.
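
    As a concrete reference for the sampler being analyzed, here is a minimal sketch of one Metropolized HMC transition: a velocity resample, several leapfrog steps, and a Metropolis-Hastings accept/reject. The functions log_p and grad_log_p are assumed to be supplied by the user, and the step size and step count are illustrative placeholders rather than the tuned values from the paper.

        import numpy as np

        def leapfrog(x, v, grad_log_p, step_size, n_steps):
            """Run n_steps of the leapfrog integrator for Hamiltonian dynamics."""
            v = v + 0.5 * step_size * grad_log_p(x)   # initial velocity half-step
            for _ in range(n_steps - 1):
                x = x + step_size * v                 # full position step
                v = v + step_size * grad_log_p(x)     # full velocity step
            x = x + step_size * v
            v = v + 0.5 * step_size * grad_log_p(x)   # final velocity half-step
            return x, v

        def metropolized_hmc_step(x, log_p, grad_log_p, step_size, n_steps, rng):
            """One Metropolized HMC transition targeting the density exp(log_p)."""
            v = rng.standard_normal(x.shape)          # resample velocity
            x_new, v_new = leapfrog(x, v, grad_log_p, step_size, n_steps)
            # Hamiltonian H(x, v) = -log_p(x) + |v|^2 / 2; accept w.p. min(1, exp(-dH))
            h_old = -log_p(x) + 0.5 * (v @ v)
            h_new = -log_p(x_new) + 0.5 * (v_new @ v_new)
            return x_new if np.log(rng.uniform()) < h_old - h_new else x

    With n_steps = 1 this transition reduces to MALA, which is why the comparison above hinges on taking more than one leapfrog step per proposal.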

    Fast and Robust Archetypal Analysis for Representation Learning

    We revisit a pioneering unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis has not gained much popularity, even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring archetypal analysis back into favour. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks such as codebook learning, signal classification, and large image collection visualization.
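
    Since the abstract gives no formula, it may help to recall the underlying optimization problem: given data $X \in \mathbb{R}^{d \times n}$, archetypal analysis seeks $p$ archetypes $Z = XB$ that are convex combinations of data points, while each data point is in turn approximated by a convex combination $ZA$ of archetypes. Below is a minimal numpy sketch using alternating projected-gradient updates; it is an illustrative baseline under this standard formulation, not the paper's active-set solver, and the fixed learning rate is a simplification.

        import numpy as np

        def project_simplex(v):
            """Euclidean projection of a vector onto the probability simplex."""
            u = np.sort(v)[::-1]
            css = np.cumsum(u) - 1.0
            rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
            return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

        def archetypal_analysis(X, p, n_iter=200, lr=1e-2, seed=0):
            """Fit X (d x n) ~ Z @ A with Z = X @ B; columns of A, B on the simplex."""
            rng = np.random.default_rng(seed)
            d, n = X.shape
            A = rng.random((p, n)); A /= A.sum(axis=0)   # per-point coefficients
            B = rng.random((n, p)); B /= B.sum(axis=0)   # archetype weights
            for _ in range(n_iter):
                Z = X @ B                                # current archetypes
                R = Z @ A - X                            # reconstruction residual
                A -= lr * (Z.T @ R)                      # gradient step on A ...
                A = np.apply_along_axis(project_simplex, 0, A)  # ... then project
                R = (X @ B) @ A - X
                B -= lr * (X.T @ R @ A.T)                # gradient step on B ...
                B = np.apply_along_axis(project_simplex, 0, B)  # ... then project
            return X @ B, A                              # archetypes Z and codes A

    The simplex constraints on both factors are what make the learned archetypes interpretable: each archetype is a mixture of actual data points, and each data point is a mixture of archetypes.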

    A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm under Smoothness and Isoperimetry

    We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$. We assume that the target density satisfies $\psi_\mu$-isoperimetry and that the operator norm and trace of its Hessian are bounded by $L$ and $\Upsilon$ respectively. Our main result establishes that, from a warm start, to achieve $\epsilon$-total variation distance to the target density, MALA mixes in $O\left(\frac{(L\Upsilon)^{1/2}}{\psi_\mu^2}\log\left(\frac{1}{\epsilon}\right)\right)$ iterations. Notably, this result holds beyond the log-concave sampling setting, and the mixing time depends only on $\Upsilon$ rather than on its upper bound $Ld$. In the $m$-strongly log-concave and $L$-log-smooth sampling setting, our bound recovers the previous minimax mixing bound of MALA (Wu et al., 2021).
    Comment: 16 pages.
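
    For reference, here is a minimal sketch of one MALA iteration: a Langevin proposal corrected by a Metropolis-Hastings step. The functions log_p and grad_log_p are user-supplied, and the step size h is an illustrative placeholder rather than the theoretically optimal choice from the analysis.

        import numpy as np

        def mala_step(x, log_p, grad_log_p, h, rng):
            """One Metropolis-adjusted Langevin step targeting the density exp(log_p)."""
            # Langevin proposal: gradient drift plus Gaussian noise
            y = x + h * grad_log_p(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
            # log Gaussian proposal densities q(y | x) and q(x | y), up to a shared constant
            log_q_fwd = -np.sum((y - x - h * grad_log_p(x))**2) / (4.0 * h)
            log_q_bwd = -np.sum((x - y - h * grad_log_p(y))**2) / (4.0 * h)
            # Metropolis-Hastings correction keeps exp(log_p) invariant
            log_accept = log_p(y) + log_q_bwd - log_p(x) - log_q_fwd
            return y if np.log(rng.uniform()) < log_accept else x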

    Domain adaptation under structural causal models

    Domain adaptation (DA) arises as an important problem in statistical machine learning when the source data used to train a model differ from the target data used to test the model. Recent advances in DA have mainly been application-driven and have largely relied on the idea of a common subspace for source and target data. To understand the empirical successes and failures of DA methods, we propose a theoretical framework via structural causal models that enables analysis and comparison of the prediction performance of DA methods. This framework also allows us to itemize the assumptions needed for the DA methods to have a low target error. Additionally, with insights from our theory, we propose a new DA method called CIRM that outperforms existing DA methods when both the covariates and the label distribution are perturbed in the target data. We complement the theoretical analysis with extensive simulations to show the necessity of the devised assumptions. Reproducible synthetic and real data experiments are also provided to illustrate the strengths and weaknesses of DA methods when parts of the assumptions in our theory are violated.
    Comment: 80 pages, 22 figures, accepted in JMLR.
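
    To make the structural-causal-model viewpoint concrete, here is a toy data-generating sketch in which a single intervention shifts both the covariate and the label distribution between source and target domains. The variable names and mechanisms are hypothetical illustrations, not the models or the CIRM method analyzed in the paper.

        import numpy as np

        def sample_domain(n, shift, rng):
            """Toy linear SCM: the intervention `shift` perturbs both X and Y marginals."""
            u = rng.standard_normal(n)                       # latent confounder
            x = u + shift + 0.5 * rng.standard_normal(n)     # covariate, shifted in target
            y = 2.0 * x - u + 0.1 * rng.standard_normal(n)   # label depends on X and U
            return x, y

        rng = np.random.default_rng(0)
        x_src, y_src = sample_domain(10_000, shift=0.0, rng=rng)  # source domain
        x_tgt, y_tgt = sample_domain(10_000, shift=1.5, rng=rng)  # intervened target domain
        # Both the covariate mean and the label mean differ across domains:
        print(x_src.mean(), x_tgt.mean(), y_src.mean(), y_tgt.mean())

    Because the conditional mechanism generating Y from (X, U) is unchanged while its inputs are intervened on, a method that exploits the causal structure can still predict well on the target, which is the regime the framework above is designed to analyze.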

    Fast MCMC sampling algorithms on polytopes

    We propose and analyze two new MCMC sampling algorithms, the Vaidya walk and the John walk, for generating samples from the uniform distribution over a polytope. Both random walks are sampling algorithms derived from interior point methods. The former is based on the volumetric-logarithmic barrier introduced by Vaidya, whereas the latter uses John's ellipsoids. We show that the Vaidya walk mixes in significantly fewer steps than the logarithmic-barrier-based Dikin walk studied in past work. For a polytope in $\mathbb{R}^d$ defined by $n > d$ linear constraints, we show that the mixing time from a warm start is bounded as $\mathcal{O}(n^{0.5}d^{1.5})$, compared to the $\mathcal{O}(nd)$ mixing time bound for the Dikin walk. The cost of each step of the Vaidya walk is of the same order as that of the Dikin walk, and at most twice as large in terms of constant pre-factors. For the John walk, we prove an $\mathcal{O}(d^{2.5}\log^4(n/d))$ bound on its mixing time and conjecture that an improved variant of it could achieve a mixing time of $\mathcal{O}(d^2\,\mathrm{polylog}(n/d))$. Additionally, we propose variants of the Vaidya and John walks that mix in polynomial time from a deterministic starting point. The speed-up of the Vaidya walk over the Dikin walk is illustrated in numerical examples.
    Comment: 86 pages, 9 figures. First two authors contributed equally.
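
    For orientation, here is a minimal sketch of the logarithmic-barrier Dikin walk that the two new walks refine; the polytope is $\{x : Ax < b\}$, and the proposal radius r is an illustrative placeholder. The Vaidya and John walks follow the same template with different barrier Hessians, which is the source of their improved mixing bounds.

        import numpy as np

        def barrier_hessian(A, b, x):
            """Hessian of the log-barrier -sum_i log(b_i - a_i^T x) at x."""
            s = b - A @ x                                 # slacks, positive inside
            return (A / s[:, None]**2).T @ A              # sum_i a_i a_i^T / s_i^2

        def dikin_walk_step(A, b, x, r, rng):
            """One Dikin walk step for uniform sampling over {x : Ax < b}."""
            d = x.shape[0]
            H = barrier_hessian(A, b, x)
            # propose inside the Dikin ellipsoid: z ~ N(x, (r^2 / d) * H(x)^{-1})
            L = np.linalg.cholesky(np.linalg.inv(H))
            z = x + (r / np.sqrt(d)) * (L @ rng.standard_normal(d))
            if np.any(b - A @ z <= 0):                    # left the polytope: reject
                return x
            Hz = barrier_hessian(A, b, z)
            # Metropolis filter: the target is uniform, so the acceptance ratio is
            # the ratio of Gaussian proposal densities q(x | z) / q(z | x)
            log_q_zx = 0.5 * np.linalg.slogdet(H)[1] - (d / (2 * r**2)) * ((z - x) @ H @ (z - x))
            log_q_xz = 0.5 * np.linalg.slogdet(Hz)[1] - (d / (2 * r**2)) * ((x - z) @ Hz @ (x - z))
            return z if np.log(rng.uniform()) < log_q_xz - log_q_zx else x

    The state-dependent ellipsoidal proposal is what lets these walks take long steps near the center of the polytope while automatically shrinking near its boundary.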

    Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling

    We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution. We establish its optimal minimax mixing time under a warm start. Our main contribution is two-fold. First, for a $d$-dimensional log-concave density with condition number $\kappa$, we show that MALA with a warm start mixes in $\tilde{O}(\kappa\sqrt{d})$ iterations up to logarithmic factors. This improves upon the previous work on the dependency on either the condition number $\kappa$ or the dimension $d$. Our proof relies on comparing the leapfrog integrator with the continuous Hamiltonian dynamics, where we establish a new concentration bound for the acceptance rate. Second, we prove a spectral-gap-based mixing time lower bound for reversible MCMC algorithms on general state spaces. We apply this lower bound to construct a hard distribution for which MALA requires at least $\tilde{\Omega}(\kappa\sqrt{d})$ steps to mix. The lower bound for MALA matches our upper bound in terms of condition number and dimension. Finally, numerical experiments are included to validate our theoretical results.
    Comment: 63 pages, 2 figures.
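
    The lower-bound strategy described above builds on the standard relation between the absolute spectral gap $\gamma_*$ of a reversible chain and its mixing time; as a point of reference, the textbook version of that relation (see, e.g., Levin and Peres, Markov Chains and Mixing Times) reads

        % The relaxation time t_rel = 1/\gamma_* lower-bounds the mixing time of a
        % reversible Markov chain, so a small spectral gap forces slow mixing.
        t_{\mathrm{mix}}(\epsilon) \;\ge\; \left(\frac{1}{\gamma_*} - 1\right)\log\left(\frac{1}{2\epsilon}\right).

    Exhibiting a hard distribution on which MALA's spectral gap is small then translates directly into the $\tilde{\Omega}(\kappa\sqrt{d})$ mixing lower bound.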